Simple Sampling Techniques for Discovery Science
Author
Abstract
We explain three random sampling techniques that are simple yet widely applicable to problems involving huge data sets. The first technique is a direct application of large deviation bounds. The second and third are sequential (adaptive) sampling techniques. We fix one simple problem and explain the techniques by presenting algorithms for it and discussing their correctness and efficiency.
Key words: random sampling, the Chernoff bound, the Hoeffding bound, the Central Limit Theorem, sequential sampling, adaptive sampling
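As an illustration of the first technique, here is a minimal sketch of how a large deviation bound fixes a sample size in advance. The abstract does not say which problem the paper fixes, so this sketch assumes a common one: estimating the fraction p of items in a data set that satisfy some predicate. It uses the two-sided Hoeffding bound, Pr[|p_hat - p| >= epsilon] <= 2 exp(-2 n epsilon^2), which gives n >= ln(2/delta) / (2 epsilon^2). The function names and parameters are hypothetical and are not taken from the paper.

```python
import math
import random


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta.

    epsilon: additive error tolerated in the estimate.
    delta:   allowed probability that the error exceeds epsilon.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_fraction(data, predicate, epsilon, delta, rng=None):
    """Estimate the fraction of items in `data` that satisfy `predicate`.

    Assumed setting (not from the paper): items are drawn uniformly at
    random with replacement, so the draws are independent 0/1 trials and
    the Hoeffding bound applies.
    """
    rng = rng or random.Random()
    n = hoeffding_sample_size(epsilon, delta)
    hits = sum(1 for _ in range(n) if predicate(rng.choice(data)))
    return hits / n


if __name__ == "__main__":
    data = [1] * 300 + [0] * 700  # true fraction is 0.3
    n = hoeffding_sample_size(0.05, 0.05)
    est = estimate_fraction(data, lambda x: x == 1, 0.05, 0.05, random.Random(0))
    print(f"sample size: {n}, estimate: {est:.3f}")
```

The sample size depends only on epsilon and delta, not on the size of the data set, which is why this approach suits huge data sets. Because the size is fixed for the worst case, it can be larger than necessary; reducing it when the data allow is the usual motivation for sequential or adaptive sampling.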
Similar resources
How Can Computer Science Contribute to Knowledge Discovery?
Knowledge discovery, that is, analyzing a given massive data set to derive or discover knowledge from it, has become an important subject in several fields, including computer science. Good software is in demand for various knowledge discovery tasks, and such software often requires efficient algorithms for handling huge data sets. Random sampling is one of ...
Practical Algorithms for On-line Sampling
One of the core applications of machine learning to knowledge discovery consists of building a function (a hypothesis), for instance a decision tree or a neural network, from a given amount of data so that it can later be used to predict new instances of the data. In this paper, we focus on a particular situation where we assume that the hypothesis we want to use for prediction is very si...
Importance sampling the Rayleigh phase function.
Rayleigh scattering is used frequently in Monte Carlo simulation of multiple scattering. The Rayleigh phase function is quite simple, and one might expect that it should be simple to importance sample it efficiently. However, there seems to be no one good way of sampling it in the literature. This paper provides the details of several different techniques for importance sampling the Rayleigh ph...
A Review of Data Mining Applications in the Health System
Introduction: The extensive amounts of data stored in medical databases require specialized tools for data access, data analysis, knowledge discovery, and effective use of the data. Data mining is one of the most important of these methods. The article sketches the data mining techniques used and illustrates their applicability to medical diagnostic and prognostic problems. ...